Last Quarter’s Review

Published

January 9, 2025

Before we start

  • You are expected to have R and RStudio installed. If not, see the Installing R and RStudio section.

  • In the discussion section, we will focus on coding and practicing what we have learned in the lectures.

  • Office hours are on Tuesday, 11:00-12:30, in Scott 110.

  • Questions?

Brief recap of the last quarter

Coding Terminology

Code Chunk

To insert a code chunk, press Ctrl+Alt+I on Windows or Cmd+Option+I on Mac. Run the whole chunk by clicking the green triangle, or run one or more selected lines with Ctrl+Enter on Windows or Cmd+Return on Mac.

print("Code Chunk")
[1] "Code Chunk"

Function and Arguments

Most of the functions we want to run require at least one argument. For example, the function print() above takes the argument “Code Chunk”.

function(argument)
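
As a small illustration of the function(argument) pattern, the sketch below calls round(), a base R function, with one unnamed and one named argument. The value 3.14159 and the object name rounded are only examples.

```r
# round() takes the number to round and, optionally, how many digits to keep.
rounded = round(3.14159, digits = 2)
rounded

# Arguments can also be matched by position instead of by name.
round(3.14159, 2)
```

Naming arguments (digits = 2) is optional here, but it usually makes the call easier to read.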

Data structures

There are many data structures, but the most important ones to know are the following.

  • Objects. These are individual units, e.g., a number or a word.
number = 1
number

word = "Northwestern"
word
[1] 1
[1] "Northwestern"
  • Vectors. Vectors are collections of objects. To create one, use the function c().
numbers = c(1, 2, 3)
numbers
[1] 1 2 3
  • Dataframes. Dataframes are the most used data structure, and last quarter you spent a lot of time working with them. A dataframe is a table of data. Its columns are called variables, and each column is a vector. You can access a column using the $ operator.
df = data.frame(numbers, 
                numbers_multiplied = numbers * 2)
df
df$numbers_multiplied
  numbers numbers_multiplied
1       1                  2
2       2                  4
3       3                  6
[1] 2 4 6
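
The three structures above can be explored a little further with indexing. This is a minimal sketch using only base R; it recreates numbers and df so it runs on its own, and the names second_number, numbers_plus_ten, and second_row are introduced here purely for illustration.

```r
numbers = c(1, 2, 3)

# Square brackets pick elements of a vector by position (R counts from 1).
second_number = numbers[2]
second_number

# Arithmetic on a vector is applied to every element at once.
numbers_plus_ten = numbers + 10
numbers_plus_ten

df = data.frame(numbers,
                numbers_multiplied = numbers * 2)

# For a dataframe, use [row, column]; leaving the column side empty keeps all columns.
second_row = df[2, ]
second_row

# nrow() and ncol() report the size of the table.
nrow(df)
ncol(df)
```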

Data classes

We work with various classes of data, and the analysis we perform depends heavily on these classes.

  • Numeric. Continuous data.
numeric_class = c(1.2, 2.5, 7.3)
numeric_class
class(numeric_class)
[1] 1.2 2.5 7.3
[1] "numeric"
  • Integer. Whole numbers (e.g., count data).
integer_class = c(1:3)
class(integer_class)
[1] "integer"
  • Character. Usually represents textual data.
word
[1] "Northwestern"
class(word)
[1] "character"
  • Factor. Categorical variables, where each value is treated as an identifier for a category.
colors = c("blue", "green")
class(colors)
[1] "character"

As you can see, R stored the colors as character data rather than as a factor, because text is treated as character by default. We can change the class with the as.factor() function. Other classes can be changed in the same way, using as.numeric(), as.integer(), or as.character().

colors = as.factor(colors)
class(colors)
[1] "factor"
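
The conversion functions mentioned above can be sketched as follows. The example values are made up for illustration. The last part shows a common pitfall: calling as.numeric() directly on a factor returns its internal level codes, so the usual workaround is to convert to character first.

```r
# Text that looks like a number can be converted to numeric.
text_numbers = c("1.5", "2", "10")
converted = as.numeric(text_numbers)
class(converted)

# A factor stores its categories as levels.
colors = as.factor(c("blue", "green", "blue"))
levels(colors)

# Caution: as.numeric() on a factor returns the level codes, not the labels.
years = as.factor(c("2020", "2025"))
codes = as.numeric(years)                  # level codes
values = as.numeric(as.character(years))   # the original numbers
codes
values
```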

Libraries

Quite frequently, we will use additional libraries to extend the capabilities of R. I’m sure you remember tidyverse. Let’s load it.

library(tidyverse)

If you recently installed or updated R, you can install libraries using the function install.packages().
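
One possible way to avoid reinstalling a library that is already present is sketched below. The helper install_if_missing() is hypothetical (it is not part of R or of the course materials); it relies on the base functions requireNamespace() and install.packages(), and installing assumes an internet connection.

```r
# Hypothetical helper: install a package only when it is missing,
# then report whether it is available.
install_if_missing = function(package_name) {
  if (!requireNamespace(package_name, quietly = TRUE)) {
    install.packages(package_name)  # needs an internet connection
  }
  requireNamespace(package_name, quietly = TRUE)
}

# "stats" ships with R, so this call installs nothing.
stats_available = install_if_missing("stats")
stats_available

# For this course you would run the following once:
# install_if_missing("tidyverse")
```

Installation is needed only once per R installation, while library() has to be called in every new session.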

Pipes

Pipes (%>% or |>) help streamline code. They let you write a sequence of steps in the order in which they happen. In plain English, a pipe translates to “take an object, and then”.

numbers %>%
  print()
[1] 1 2 3
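
To see why pipes make code more linear, the sketch below writes the same calculation twice: once with nested calls and once with the native pipe |>. It assumes R 4.1 or newer, where |> is available without loading any library; the example numbers are chosen only so the result is easy to check.

```r
numbers = c(1, 4, 9)

# Nested calls are read from the inside out.
nested_result = sum(sqrt(numbers))

# With a pipe the steps are read top to bottom:
# take numbers, and then take square roots, and then add them up.
piped_result = numbers |>
  sqrt() |>
  sum()

piped_result
```

Both versions give the same result; the piped version simply lists the steps in the order they are carried out.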

Base R vs Tidyverse

Useful functions, sample()
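
As a quick reminder of how sample() behaves, here is a minimal sketch. The seed value 123 is arbitrary and the object names are made up for illustration; set.seed() only makes the random draws repeatable.

```r
# set.seed() makes the random draw reproducible.
set.seed(123)

# Draw 3 values from 1 to 10 without replacement (the default).
draw = sample(1:10, size = 3)
draw

# With replace = TRUE the same value can be drawn more than once.
coin_flips = sample(c("heads", "tails"), size = 5, replace = TRUE)
coin_flips
```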

Visualizations

Tidyverse basics (mutate, filter, select, summarize, etc.)

Descriptive statistics

Confidence intervals

Statistic            Function    Example Usage
Minimum              min()       min(x)
Maximum              max()       max(x)
Mean                 mean()      mean(x)
Median               median()    median(x)
Standard Deviation   sd()        sd(x)
Variance             var()       var(x)
Sum                  sum()       sum(x)
Summary              summary()   summary(x)
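
The functions in the table can be tried on a small vector. The values in x below are made up so that the results are easy to verify by hand.

```r
x = c(2, 4, 4, 5, 10)

min(x)      # 2
max(x)      # 10
mean(x)     # 5
median(x)   # 4
sd(x)       # 3
var(x)      # 9
sum(x)      # 25

# summary() reports the minimum, quartiles, median, mean, and maximum together.
summary(x)
```

Note that sd() and var() use the sample formulas, dividing by n - 1 rather than n.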

Helpful to review

Installing R and RStudio

First, we need to install R. Click the button below and click “Download and Install R”. Choose your OS. For Windows, download “base”; for macOS and Linux, choose the version that matches your OS. Then run the installer.

Download R
Step one

Second, we need to install RStudio. Click the button below and click “Download RStudio Desktop”. You will be redirected to the version for your OS automatically. Then run the installer.

Download RStudio
Step two